Abstract

We continue to explore the hypothesis that neuronal populations
represent and process analog variables in terms of probability
density functions (PDFs). A neural assembly encoding the joint
probability density over relevant analog variables can in principle
answer any meaningful question about these variables by implementing
the Bayesian rules of inference. Aided by an intermediate
representation of the probability density based on orthogonal
functions spanning an underlying low-dimensional function space, we
show how neural circuits may be generated from Bayesian belief
networks. The ideas and the formalism of this PDF approach are
illustrated and tested with several elementary examples, and in
particular through a problem in which model-driven top-down
information flow influences the processing of bottom-up sensory
input.



1 Introduction 

1.1 Fundamental Hypothesis 



This is the second of two papers elaborating upon the proposition
(Anderson, 1994) that neural populations encode and process
information about analog variables in the form of probability density
functions (PDFs). As demonstrated in the first paper (Barber et al.,
2001), explicit representation of probabilistic descriptors of the
state of knowledge of physically relevant variables subserves a
powerful strategy for modeling neural circuits. By exploiting
mathematical tools developed within the theory of Bayesian inference,
we can establish general procedures for building and understanding
models of cortical circuits that carry out well-posed
information-processing tasks.

Importantly, if neural populations encode the joint probability
distribution over the variables of interest, then such neural
networks are able to answer any probabilistic question about those
variables. Motivated by this fact, we shall now extend the PDF
hypothesis by devising methods for embedding joint probabilities into
neural networks. This will enable us to construct neural circuit
models that pool multiple sources of evidence, such as sensory inputs
and any evolutionarily determined priors on the joint distribution.
More specifically, we will focus on developing neural networks that
use "bottom-up" sensory inputs to build an internal model of the
data, which in turn uses "top-down" signals to impose global
regularities on the sensory data. In the resulting neural networks,
there will naturally arise distinct feedforward, feedback, and
lateral connections, analogous to the neural pathways observed in the
anatomy of the cerebral cortex (Van Essen et al., 1992).



Our treatment is based on Bayesian belief networks (Pearl, 1988;
Smyth et al., 1997), graphical representations of probabilistic
models that provide an efficient means for organizing the relations
between the random variables of a given model. The resulting neural
networks will have several properties of Bayesian belief networks, as
well as more typical neural-network properties, so we will call them
neural belief networks. Bayesian belief networks have previously been
utilized in the genesis of certain neural networks (Neal, 1992;
Zemel, 1999), although with different methods for generating the
neural network architecture and dynamics.

1.2 Three Levels of Representation 



In the context of the PDF hypothesis (Anderson, 1994), we assert that
a physical variable $x$ is described by a neural population at time
$t$ in terms of a PDF $p(x;t)$, rather than as a single-valued
estimate $x(t)$. In general, we consider a PDF described at time $t$
in terms of a set of $D$ parameters $\{A_\mu(t)\}$. Experimentally
observed linear decoding rules (Georgopoulos et al., 1986; Schwartz,
1993) suggest that the PDFs may be represented as

$$ p(x;t) = \sum_{\mu=1}^{D} A_\mu(t)\,\Phi_\mu(x) \qquad (1) $$

The basis functions $\Phi_\mu(x)$ are orthonormal functions which
serve to define the PDFs that the neural circuit can represent, but
the amplitudes $A_\mu(t)$ cannot be interpreted as neuronal firing
rates, because they have arbitrarily high precision and can take on
negative values. Therefore, we introduce an additional representation
of the PDF in terms of firing rates $a_i(t)$ and decoding functions
$\phi_i(x)$ assigned to $N$ neurons, so that

$$ p(x;t) = \sum_{i=1}^{N} a_i(t)\,\phi_i(x) \qquad (2) $$

Unlike the basis functions $\Phi_\mu(x)$, the decoding functions
$\phi_i(x)$ form a highly redundant, overcomplete representation
($N \gg D$) suitable for use with neuronal units having biologically
realistic precision (some 2-3 bits; see Rieke et al., 1997).

The abstract representation defined by equation 1 will underlie the
representation in the neuron space defined by equation 2. This allows
us to deal with the issue of how PDFs can be precisely implemented in
populations of neurons by focusing on the mapping between the minimal
space and the space of neurons (Barber et al., 2001). Thus, neural
belief networks can be developed in the theoretically convenient
abstract representation, and then be implemented in more realistic
networks of low-precision model neurons. Adopting the terminology of
Zemel et al. (1998), we denote the set of physical variables as the
implicit space and the measurable quantities as the explicit space.
Extending their nomenclature, we denote the abstract space of
equation 1 as the minimal space. The explicit space of neurons
constitutes a biological implementation of the desired computations
in the implicit space, while the minimal space, whose properties are
more conducive to formal analysis, provides a valuable bridge between
the two other spaces.

We have developed rules to transform between representations in the
three spaces (Barber et al., 2001). Equations 1 and 2 respectively
define how to transform representations in the minimal and explicit
spaces into a representation (PDF) in the implicit space. With the
remaining rules (summarized below), we can readily switch between the
three representations and approach any task in the most appropriate
space.

For the orthonormal basis $\{\Phi_\mu(x)\}$, the minimal space
coefficients $A_\mu(t)$ are found using the encoding rule

$$ A_\mu(t) = \int \Phi_\mu(x)\,p(x;t)\,dx \qquad (3) $$

The coefficients in the explicit space, i.e. the neural firing rates
$a_i(t)$, cannot be found in this direct fashion. We utilize a set of
encoding functions $\hat{\phi}_i(x)$ and an encoding rule of the form

$$ a_i(t) = f\left( \int \hat{\phi}_i(x)\,p(x;t)\,dx \right) \qquad (4) $$

where $f$ is a nonlinear activation function. This explicit-space
encoding rule is patterned after the neural responses associated with
the population vector of Georgopoulos et al. (1986).

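To relate the explicit and minimal spaces, we express the decoding
functions $\phi_i(x)$ and the encoding functions $\hat{\phi}_i(x)$ in
terms of the orthonormal basis, so that

$$ \phi_i(x) = \sum_{\nu=1}^{D} \kappa_{\nu i}\,\Phi_\nu(x) \qquad (5) $$

$$ \hat{\phi}_i(x) = \sum_{\nu=1}^{D} \hat{\kappa}_{i\nu}\,\Phi_\nu(x) \qquad (6) $$

The transformation coefficients $\kappa_{\nu i}$ and
$\hat{\kappa}_{i\nu}$ also relate the $A_\nu(t)$ and $a_i(t)$, such
that

$$ A_\nu(t) = \sum_{i=1}^{N} \kappa_{\nu i}\,a_i(t) \qquad (7) $$

$$ a_i(t) = f\left( \sum_{\nu=1}^{D} \hat{\kappa}_{i\nu}\,A_\nu(t) \right) \qquad (8) $$

Methods for determining the encoding functions $\hat{\phi}_i(x)$,
decoding functions $\phi_i(x)$, and transformation coefficients
$\kappa_{\nu i}$ and $\hat{\kappa}_{i\nu}$ have been addressed in
detail in the earlier work (Barber et al., 2001).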
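The minimal-space encoding and decoding rules (equations 1 and 3) can be illustrated with a small numerical sketch. Everything specific here is an assumption made for illustration rather than taken from the earlier work: the grid on [-1, 1], the construction of the orthonormal basis by a QR factorization of ten Gaussian bumps, and the helper names `orthonormal_basis`, `encode`, and `decode`.

```python
import numpy as np

# Grid over the implicit variable x; the interval [-1, 1] is an assumption.
x = np.linspace(-1.0, 1.0, 401)
dx = x[1] - x[0]

def orthonormal_basis(centers, width):
    """Orthonormalize Gaussian bumps so that sum(Phi_m * Phi_n) * dx = delta_mn."""
    bumps = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)
    q, _ = np.linalg.qr(bumps)      # columns of q are orthonormal vectors
    return q.T / np.sqrt(dx)        # rows are basis functions sampled on the grid

def encode(p, basis):
    """Equation 3: A_mu = integral of Phi_mu(x) p(x) dx (Riemann sum)."""
    return basis @ p * dx

def decode(A, basis):
    """Equation 1: p(x) = sum over mu of A_mu Phi_mu(x)."""
    return A @ basis

basis = orthonormal_basis(np.linspace(-1.0, 1.0, 10), width=0.2)   # D = 10
p = np.exp(-0.5 * ((x - 0.3) / 0.2) ** 2)
p /= p.sum() * dx                   # normalize to unit area
A = encode(p, basis)                # minimal-space coefficients
p_hat = decode(A, basis)            # PDF recovered from the coefficients
rel_err = np.linalg.norm(p_hat - p) / np.linalg.norm(p)
print(A.round(3), rel_err)
```

Because the basis is orthonormal, encoding followed by decoding is an orthogonal projection onto the minimal space, so PDFs lying outside the span of the basis are recovered only approximately.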



2 Neural Belief Networks 
2.1 Bayesian Belief Networks 

In section 1.2, we have summarized methods for encoding and decoding
probability density functions into and from the firing rates of
populations of neurons. These methods are oriented towards encoding a
single random variable (or vector), but we do not wish to restrict
ourselves to only the simplest implicit spaces. In this work, we will
explore ways in which we can apply the methods so far developed
(Barber et al., 2001) to more complicated implicit spaces. In
particular, we will use Bayesian belief networks to efficiently
organize the implicit random variables, and then use these Bayesian
belief networks to generate neural networks.

Bayesian belief networks are directed acyclic graphs that represent
probabilistic models (Figure 1). Each node represents a random
variable, and the arcs (or directed line segments) signify the
presence of direct causal influences between the linked variables.
The strengths of these influences are defined using conditional
probabilities. The directionality of a specified link indicates the
direction of causality (or, more simply, relevance); an arc points
from direct cause to effect.



Figure 1: A chain-structured Bayesian belief network. Evidence $e^+$
and $e^-$ from the two ends of the chain influences the belief in the
random variables X and Y. In a straightforward terminology, X is
referred to as the parent of Y and Y as the child of X. From the
structure of the graph, we can see for example that Y is
conditionally independent of $e^+$ given X; this is true regardless
of the values of the links $P(e^- | y)$, $P(y | x)$, and
$P(x | e^+)$.



Bayesian belief networks have two properties that we will find very
useful, both of which stem from the independencies shown by the graph
structure. First, the value of a node X is not dependent upon all of
the other graph nodes. Rather, it depends only on a subset of the
nodes, called a Markov blanket of X, that separates node X from all
the other nodes in the graph. The Markov blanket of interest to us
can be readily determined from the graph structure. It is the union
of the direct parents of X, the direct successors of X, and all
direct parents of the direct successors of X. Second, the joint
probability over the random variables is decomposable as

$$ P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \,|\, \mathrm{Pa}(x_i)) \qquad (9) $$

where $\mathrm{Pa}(x_i)$ refers to the (possibly empty) set of direct
parent nodes of $x_i$. This decomposition comes about from repeated
application of Bayes' rule and from the structure of the graph.
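The decomposition in equation 9 can be checked on a toy discrete model. The sketch below assumes a three-node chain X -> Y -> Z with arbitrary (randomly generated) conditional probability tables; the table sizes and the helper `random_cpt` are hypothetical and serve only to illustrate the factorization and the conditional independence it implies.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(n_child, n_parent):
    """Random conditional table P(child | parent); each column sums to 1."""
    t = rng.random((n_child, n_parent))
    return t / t.sum(axis=0, keepdims=True)

p_x = np.array([0.2, 0.5, 0.3])     # P(x): the root has no parents
p_y_given_x = random_cpt(4, 3)      # P(y | x), indexed [y, x]
p_z_given_y = random_cpt(2, 4)      # P(z | y), indexed [z, y]

# Equation 9 for the chain X -> Y -> Z: P(x, y, z) = P(x) P(y | x) P(z | y).
joint = np.einsum("x,yx,zy->xyz", p_x, p_y_given_x, p_z_given_y)

# Conditional independence implied by the graph: P(z | x, y) = P(z | y).
p_xy = joint.sum(axis=2)
p_z_given_xy = joint / p_xy[:, :, None]
print(joint.sum())
```

Here Y is the Markov blanket of Z, so conditioning on X in addition to Y leaves the distribution of Z unchanged.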

2.2 Probabilistic Inference Performed by Neural Networks

Before exploring arbitrary Bayesian belief networks, it is
enlightening to consider a Bayesian belief network with a simple
graph consisting of two connected nodes, X → Y. This graph represents
any probabilistic model where a single random variable is inferred
from one source of evidence. For convenience, we will work in the
minimal space.

Our objective is to find the best marginal PDFs $p(y;t)$ and $p(x;t)$
to describe the system. We represent the PDFs using equation 1 and

$$ p(y;t) = \sum_{\nu} B_\nu(t)\,\Psi_\nu(y) \qquad (10) $$

In this network, we find values of the output firing rates
$\{B_\nu(t)\}$ only, with input firing rates $\{A_\mu(t)\}$ assumed
to be fully determined by an encoding process.

We define a cost function by

$$ E_y = \frac{1}{2} \int \left( p(y;t) - \int p(y|x)\,p(x;t)\,dx \right)^2 dy = \frac{1}{2} \int \left( \sum_{\nu} B_\nu(t)\,\Psi_\nu(y) - \sum_{\mu} A_\mu(t) \int p(y|x)\,\Phi_\mu(x)\,dx \right)^2 dy \qquad (11) $$

The cost function has a minimum corresponding to taking a weighted
average of the conditional probability $p(y|x)$,

$$ p(y;t) = \int p(y|x)\,p(x;t)\,dx \qquad (12) $$

We assume in equation 12 that the relationship between $x$ and $y$ is
independent of the values of the underlying parameters $A_\mu(t)$ of
the minimal space (Barber et al., 2001).

To minimize $E_y$, we calculate the derivatives
$\partial E_y / \partial B_\nu$. Using gradient descent, we obtain
the update rule

$$ \frac{dB_\nu(t)}{dt} = -\eta \frac{\partial E_y}{\partial B_\nu} = -\eta \left( B_\nu(t) - \sum_{\mu} \Omega_{\nu\mu}\,A_\mu(t) \right) \qquad (13) $$

where $\eta$ is a rate constant and we identify weights

$$ \Omega_{\nu\mu} = \iint \Psi_\nu(y)\,p(y|x)\,\Phi_\mu(x)\,dx\,dy \qquad (14) $$

Equations 13 and 14 define a network in the minimal space which can
be converted to a neural network in the explicit space using the
transformation rules summarized in section 1.2. Equation 13 has a
fixed point at

$$ B_\nu(t) = \sum_{\mu} A_\mu(t) \iint \Psi_\nu(y)\,p(y|x)\,\Phi_\mu(x)\,dx\,dy \qquad (15) $$

This is identical to the result obtained by encoding the inference
relation in equation 12 into the minimal space using equation 3.
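The predictive update rule can be sketched numerically as follows. This is an illustrative sketch under stated assumptions, not the implementation used for the results in this paper: the Gaussian-derived orthonormal bases, the narrow Gaussian form chosen for p(y | x), the forward Euler integration, and all numerical values are assumptions.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 201)     # grid shared by x and y (an assumption)
dx = x[1] - x[0]

def orthonormal_basis(n_funcs, width=0.25):
    """Orthonormalized Gaussian bumps; rows are basis functions on the grid."""
    centers = np.linspace(-1.0, 1.0, n_funcs)
    bumps = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)
    q, _ = np.linalg.qr(bumps)
    return q.T / np.sqrt(dx)

Phi = orthonormal_basis(8)          # basis for x
Psi = orthonormal_basis(8)          # basis for y

# Assumed link p(y | x): narrow Gaussian around y = x; cond[j, i] = p(y_j | x_i).
cond = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.1) ** 2)
cond /= cond.sum(axis=0, keepdims=True) * dx

Omega = Psi @ cond @ Phi.T * dx * dx        # feedforward weights, equation 14

p_x = np.exp(-0.5 * ((x + 0.4) / 0.2) ** 2)
p_x /= p_x.sum() * dx
A = Phi @ p_x * dx                          # encoded input, equation 3

B = np.zeros(8)
eta, dt = 1.0, 0.1
for _ in range(200):
    B += dt * (-eta * (B - Omega @ A))      # Euler step of equation 13

p_y = B @ Psi                               # decoded output PDF, equation 10
print(np.abs(B - Omega @ A).max())
```

The iteration relaxes exponentially towards the fixed point of equation 15, so after enough steps the coefficients B agree with the product of the weight matrix and the input coefficients.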

2.3 Predictive and Retrospective Support 

Neural belief networks can feature two distinct types of information
propagation that provide support for the PDFs represented at each
graph node. Predictive support, also called causal support, is
probabilistic information that propagates, along the directions of
the graph links, from cause to effect (Pearl, 1988). The network
considered in section 2.2 involves only predictive support.

The second type of support is retrospective, or diagnostic, support.
In this case, information propagates against the directions of the
graph links, from effect to cause, or, equivalently, from evidence to
hypothesis (Pearl, 1988). By specifying $p(y;t)$ instead of
$p(x;t)$, the X → Y inference network features only retrospective
support.

Using the same minimal-space representation (equations 1 and 10) that
we used for the predictive network, we determine the update rule for
the retrospective network. We find

$$ \frac{dA_\gamma(t)}{dt} = \eta \left( \sum_{\beta} B_\beta(t)\,\Omega_{\beta\gamma} - \sum_{\alpha} A_\alpha(t)\,T_{\alpha\gamma} \right) \qquad (16) $$

with feedback weights

$$ \Omega_{\beta\gamma} = \iint \Psi_\beta(y)\,p(y|x)\,\Phi_\gamma(x)\,dx\,dy \qquad (17) $$

and lateral weights

$$ T_{\alpha\gamma} = \int \left( \int p(y|x)\,\Phi_\alpha(x)\,dx \right) \left( \int p(y|x)\,\Phi_\gamma(x)\,dx \right) dy \qquad (18) $$

While the feedback weights $\Omega_{\beta\gamma}$ of the
retrospective network are identical to the feedforward weights of the
predictive network (equation 14) in the minimal space, note that the
resulting neural network weights in the explicit space need not be
identical. The lateral weights $T_{\alpha\gamma}$ provide a measure
of the correlation between what the different basis functions in the
parent node X predict about the child node Y; these lateral
connections act to ensure consistency between evidence and
hypothesis.

Although the networks driven by retrospective support are closely
related to the networks driven by predictive support, their function
is quite different. For example, consider a network with
$p(y|x) = \delta(y - |x|)$ as the underlying computation. In a
predictive network, wherein we specify $p(x;t)$ and infer $p(y;t)$,
the absolute-value relationship is approximated in a straightforward
fashion (Figure 2a). Conversely, a retrospective network based upon
the same conditional PDF is called upon to "invert" the absolute
value, a non-invertible function. The inferred $p(x;t)$ decoded from
the retrospective network (Figure 2b) captures both of the possible
solutions, with positive and negative values, in response to a
unimodal PDF $p(y;t)$.
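The retrospective inversion of the absolute value can be sketched in the same numerical setting. The sketch makes several assumptions that differ from the figure: the delta function is replaced by a narrow Gaussian around y = |x|, the bases are orthonormalized Gaussian bumps with D = 10 rather than singular vectors with D = 6, and equation 16 is integrated with forward Euler steps.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 201)     # shared grid for x and y (an assumption)
dx = x[1] - x[0]

def orthonormal_basis(n_funcs, width):
    """Orthonormalized Gaussian bumps; rows are basis functions on the grid."""
    centers = np.linspace(-1.0, 1.0, n_funcs)
    bumps = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)
    q, _ = np.linalg.qr(bumps)
    return q.T / np.sqrt(dx)

Phi = orthonormal_basis(10, 0.2)    # basis for x
Psi = orthonormal_basis(10, 0.2)    # basis for y

# Smoothed stand-in for p(y | x) = delta(y - |x|); cond[j, i] = p(y_j | x_i).
cond = np.exp(-0.5 * ((x[:, None] - np.abs(x)[None, :]) / 0.05) ** 2)
cond /= cond.sum(axis=0, keepdims=True) * dx

G = cond @ Phi.T * dx               # G[j, a] = integral of p(y_j | x) Phi_a(x) dx
Omega = Psi @ G * dx                # feedback weights, equation 17
T = G.T @ G * dx                    # lateral weights, equation 18

p_y = np.exp(-0.5 * ((x - 0.5) / 0.2) ** 2)
p_y /= p_y.sum() * dx               # unimodal evidence centered at y = +1/2
B = Psi @ p_y * dx

A = np.zeros(10)
eta, dt = 1.0, 0.1
for _ in range(500):
    A += dt * eta * (Omega.T @ B - T @ A)   # Euler step of equation 16

p_x = A @ Phi                       # decoded estimate of p(x; t)
right = x > 0
print(x[right][np.argmax(p_x[right])])
```

Since the assumed link depends on x only through |x|, the decoded estimate is symmetric about x = 0, with one mode on each side.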
Figure 2: Predictive and retrospective support of the absolute value
function. The results shown here utilize spaces of dimension D = 6 to
represent both x and y in the two networks. The structure of the
spaces was determined from the singular value decomposition
(Goldberg, 1991) of a discrete approximation of $p(y|x)$: the basis
functions $\Phi_\mu(x)$ and $\Psi_\nu(y)$ are set to the singular
vectors corresponding to the D largest singular values. (a) The
network with predictive support closely approximates the absolute
value function. Specifying a PDF $p(x;t)$ centered about x = -1/2
yields an inferred PDF $p(y;t)$ of similar form centered about
y = +1/2. (b) The network with retrospective support allows for both
possible solutions. Specifying a unimodal PDF $p(y;t)$ centered about
y = +1/2 yields an inferred bimodal PDF $p(x;t)$, with the modes
centered about x = -1/2 and x = +1/2.
2.4 Encoding Bayesian Belief Networks Into Neural Networks



Following a strategy similar to that presented in section 2.2, we can
develop neural-network update rules from arbitrary Bayesian belief
networks. We assume that the Bayesian belief network consists of R
nodes, symbolizing random variables $X_1, X_2, X_3, \ldots, X_R$.
Introducing representations

$$ p(x_i;t) = \sum_{\mu} A^{(i)}_\mu(t)\,\Phi^{(i)}_\mu(x_i) \qquad (19) $$

for the marginal distributions in the minimal space, we define
auxiliary cost functions, analogous to that in equation 11, with the
forms

$$ E_i = \frac{1}{2} \int \left( p(x_i;t) - \int p(x_i \,|\, \mathrm{Pa}(x_i)) \prod_{j} p(x_j;t)\,dx_j \right)^2 dx_i = \frac{1}{2} \int \left( \sum_{\mu} A^{(i)}_\mu(t)\,\Phi^{(i)}_\mu(x_i) - \int p(x_i \,|\, \mathrm{Pa}(x_i)) \prod_{j} \sum_{\nu} A^{(j)}_\nu(t)\,\Phi^{(j)}_\nu(x_j)\,dx_j \right)^2 dx_i \qquad (20) $$

where the index $j$ for the product runs over the direct parents of
$X_i$. We further define an aggregate cost function

$$ E = \sum_{i} K_i E_i \qquad (21) $$

The parameters $K_i$ can be used to emphasize particular portions of
the network; except where otherwise noted, we assume that $K_i = 1$
for all $i$. Employing gradient descent to minimize $E$, we find

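$$ \frac{dA^{(k)}_\mu}{dt} = -\eta \frac{\partial E}{\partial A^{(k)}_\mu} \qquad (22) $$

Since $A^{(k)}_\mu$ does not appear in all of the cost functions,
$\partial E_i / \partial A^{(k)}_\mu$ is nonzero only for $i = k$ and
when the graph node for $X_i$ is one of the direct children of the
graph node for $X_k$. Thus,

$$ \frac{1}{\eta} \frac{dA^{(k)}_\mu}{dt} = -K_k \frac{\partial E_k}{\partial A^{(k)}_\mu} - \sum_{i \in \{i : X_i \in \mathrm{Ch}(X_k)\}} K_i \frac{\partial E_i}{\partial A^{(k)}_\mu} \qquad (23) $$

As was the case for the two-node inference network, the input PDF or
PDFs will be specified by an encoding process rather than an update
rule of this sort.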
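The derivatives in equation 23 are straightforward but lengthy to
evaluate. We omit the details of the calculation and simply state the
results

$$ \frac{1}{\eta} \frac{dA^{(k)}_\mu}{dt} = -K_k \left[ A^{(k)}_\mu(t) - \sum_{\{\nu_j\}} \Omega^{(k)}_{\mu\{\nu_j\}} \prod_{j} A^{(j)}_{\nu_j}(t) \right] + \sum_{i} K_i \left[ \sum_{\rho} \sum_{\{\nu_j\}} A^{(i)}_\rho(t)\,\Omega^{(i)}_{\rho\mu\{\nu_j\}} \prod_{j \neq k} A^{(j)}_{\nu_j}(t) - \sum_{\{\nu_j\}} \sum_{\{\nu'_{j'}\}} T^{(i)}_{\mu\{\nu_j\}\{\nu'_{j'}\}} \prod_{j \neq k} A^{(j)}_{\nu_j}(t) \prod_{j'} A^{(j')}_{\nu'_{j'}}(t) \right] \qquad (24) $$

where we have defined

$$ \Omega^{(k)}_{\mu\{\nu_j\}} = \int \Phi^{(k)}_\mu(x_k)\,p(x_k \,|\, \mathrm{Pa}(x_k)) \prod_{j} \Phi^{(j)}_{\nu_j}(x_j)\,dx_j\,dx_k \qquad (25) $$

$$ \Omega^{(i)}_{\rho\mu\{\nu_j\}} = \int \Phi^{(i)}_\rho(x_i)\,p(x_i \,|\, \mathrm{Pa}(x_i))\,\Phi^{(k)}_\mu(x_k) \prod_{j \neq k} \Phi^{(j)}_{\nu_j}(x_j)\,dx_j\,dx_i\,dx_k \qquad (26) $$

and

$$ T^{(i)}_{\mu\{\nu_j\}\{\nu'_{j'}\}} = \int \left( \int \Phi^{(k)}_\mu(x_k)\,p(x_i \,|\, \mathrm{Pa}(x_i)) \prod_{j \neq k} \Phi^{(j)}_{\nu_j}(x_j)\,dx_j\,dx_k \right) \left( \int p(x_i \,|\, \mathrm{Pa}(x_i)) \prod_{j'} \Phi^{(j')}_{\nu'_{j'}}(x_{j'})\,dx_{j'} \right) dx_i \qquad (27) $$

The sums over the index $i$ in equation 24 run over the children of
$X_k$. The products are over the parents of either node $X_k$
(equation 25, first term in equation 24) or node $X_i$ (equations 26
and 27, second and third terms in equation 24), possibly excluding
node $X_k$ itself. From these equations, it can be seen that the PDF
represented at a node $X_k$ is updated based only on its direct
parents, its direct descendants, and the direct parents of its direct
descendants, so that $X_k$ is separated from all other nodes in the
neural belief network by an appropriate Markov blanket.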
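The set of nodes that can enter the update rule for a given node is its Markov blanket, which can be read off the graph directly. The following sketch assumes the graph is stored as a dictionary from each node to the list of its direct parents; the function name `markov_blanket` and the five-node example graph are hypothetical.

```python
def markov_blanket(parents, node):
    """Direct parents, direct children, and the children's other direct parents.

    parents: dict mapping each node to the list of its direct parents.
    """
    children = [c for c, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(node)           # a node is not part of its own blanket
    return blanket

# Hypothetical graph with links U -> X, X -> Y, W -> Y, and Y -> V.
parents = {"U": [], "W": [], "X": ["U"], "Y": ["X", "W"], "V": ["Y"]}

print(sorted(markov_blanket(parents, "X")))   # ['U', 'W', 'Y']
print(sorted(markov_blanket(parents, "V")))   # ['Y']
```

In this example the update for X involves U (its parent), Y (its child), and W (the other parent of Y), but not V.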

Equations 24 through 27 can be awkward to use directly. It can be
more convenient to write out the cost functions for a specific
probabilistic model using equation 20 and directly evaluate the
necessary derivatives (specified by equation 23).



3 Applications of Neural Belief Networks 
3.1 Bidirectional Propagation 

So far, we have restricted our attention to developing neural
networks that encode simple probabilistic models involving only a
single source of evidence. Given suitable representations, we can
design neural networks that capture a wide variety of probabilistic
relations (Barber et al., 2001). Further, by regarding
representations of probabilistic dependence models as Bayesian belief
networks, we have seen that two types of information propagation come
into play: predictive and retrospective.

In this section, we will examine several applications of neural
belief networks. Unlike those considered before, the probabilistic
models will now feature multiple sources of evidence, with
bidirectional propagation of both predictive and retrospective
support. The corresponding neural networks thus have neurons with
both feedforward and feedback connections, as well as lateral
connections.



3.2 Neural Propagation of Evidence in Trees 

To facilitate investigation of information propagation in neural
belief networks, we first focus on networks whose implicit spaces are
specified by tree-structured Bayesian belief networks (Figure 3a).
These implicit networks are general enough to illustrate the
concepts, while yielding neural networks that are readily understood.
We will examine binary trees, where each node has at most two
children, but the results extend simply to more general tree
structures.

We may assume that evidence is only available in the root node and
the leaf nodes (although all such nodes need not provide evidence).
The root node is the single node which has no father and is located
at the top of the tree, while the leaf nodes are all the nodes which
have no children. If another node X were externally specified, the
subtree rooted at X could be broken off and treated separately.
Conversely, the father node of X is unaffected by the descendants of
X, so they could be deleted from the original tree, leaving X as a
leaf node.

Clearly, an unspecified root node can only receive retrospective
support, while an unspecified leaf node can only receive predictive
support. All other unspecified nodes in the tree will receive both
retrospective and predictive support. We will thus need to consider
separately these three types of nodes when determining the update
rules for the neural network.
Figure 3: Tree-structured Bayesian belief networks. (a) In this tree,
node A is called the root, while nodes D, F, G, and H are called the
leaves. For trees, the direct parent of a node is called its father,
and its direct children are called its sons. Since each father has at
most two sons, the tree shown here is a binary tree. (b) A small
tree. Any of the nodes in this tree can be specified as evidence. The
two leaf nodes could also provide evidence: both leaves can provide
information to the root, but if the root and one of the leaves are
specified, the other leaf will only be driven by the root (its Markov
blanket). (c) A tree-like graph with the arrows reversed. Any two of
the nodes can be specified and their information will propagate
throughout the network. Unlike the case of the tree, the Markov
blanket of any node is both of the other nodes. (d) Chains are
special cases of trees. This three-node chain can feature both
predictive and retrospective support.
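An unspecified root node X with children Y and Z receives feedback
inputs (retrospective support) from both of them (Figure 3b). We
introduce the representations

$$ p(x;t) = \sum_{\alpha} A_\alpha(t)\,\Phi_\alpha(x) \qquad (28) $$

$$ p(y;t) = \sum_{\beta} B_\beta(t)\,\Psi_\beta(y) \qquad (29) $$

$$ p(z;t) = \sum_{\gamma} C_\gamma(t)\,\Theta_\gamma(z) \qquad (30) $$

Following the procedures described in section 2.4, the update rule
for the root is

$$ \frac{1}{\eta} \frac{dA_\mu}{dt} = -K_y \frac{\partial E_y}{\partial A_\mu} - K_z \frac{\partial E_z}{\partial A_\mu} \qquad (31) $$

with

$$ \frac{\partial E_y}{\partial A_\mu} = \sum_{\alpha} A_\alpha(t) \int \left( \int p(y|x)\,\Phi_\alpha(x)\,dx \right) \left( \int p(y|x)\,\Phi_\mu(x)\,dx \right) dy - \sum_{\beta} B_\beta(t) \iint \Psi_\beta(y)\,p(y|x)\,\Phi_\mu(x)\,dx\,dy \qquad (32) $$

and

$$ \frac{\partial E_z}{\partial A_\mu} = \sum_{\alpha} A_\alpha(t) \int \left( \int p(z|x)\,\Phi_\alpha(x)\,dx \right) \left( \int p(z|x)\,\Phi_\mu(x)\,dx \right) dz - \sum_{\gamma} C_\gamma(t) \iint \Theta_\gamma(z)\,p(z|x)\,\Phi_\mu(x)\,dx\,dz \qquad (33) $$

Thus, the firing rates for the root node are driven by a sum of
feedback inputs that individually are identical to the input produced
by a single source of retrospective support (section 2.3). The
parameters $K_y$ and $K_z$ need not be the same; different values may
be used to give greater significance to one of the inputs.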

The firing rates for an unspecified leaf node are also updated by a
familiar rule. Consider a leaf node X with father U. Using equation
28 and

$$ p(u;t) = \sum_{\delta} D_\delta(t)\,\Lambda_\delta(u) \qquad (34) $$

we obtain the update rule

$$ \frac{1}{\eta} \frac{dA_\mu}{dt} = -K_x \frac{\partial E_x}{\partial A_\mu} \qquad (35) $$

where

$$ \frac{\partial E_x}{\partial A_\mu} = A_\mu(t) - \sum_{\delta} D_\delta(t) \iint \Lambda_\delta(u)\,p(x|u)\,\Phi_\mu(x)\,du\,dx \qquad (36) $$

This update rule is of course identical to the update rule for the
X → Y network with predictive support (section 2.2).

All nonroot, nonleaf nodes have similar update rules. For a node X
with father U and sons Y and Z, we impose the representations given
above in equations 28 through 30 and equation 34. Applying the
procedure of section 2.4 yields an update rule with form

$$ \frac{1}{\eta} \frac{dA_\mu}{dt} = -K_x \frac{\partial E_x}{\partial A_\mu} - K_y \frac{\partial E_y}{\partial A_\mu} - K_z \frac{\partial E_z}{\partial A_\mu} \qquad (37) $$

The partial derivatives of the cost functions are identical to those
evaluated for the root and leaf nodes. The descendants and ancestors
of X thus communicate, in the neural network, only through the
intermediary of X itself. This is consistent with the tree structure
of the underlying Bayesian belief network; given X, the descendants
of X are conditionally independent of the ancestors of X (Pearl,
1988).



With these update rules, evidence provided at the root node or at
leaf nodes will propagate throughout the network. The manner in which
evidence is specified will depend on how the probabilistic model is
posed. Therefore, the same graph could have different nodes specified
for different purposes. For instance, the small tree in Figure 3b
could have any of its nodes represent sensory inputs.

If we specify the PDF for the root node, the leaf nodes Y and Z will
receive predictive support from X. By selecting appropriate
representations for the PDFs and for the conditional probabilities
associated with the links, the resulting neural network could
subdivide a complex, highly general sensory input into simpler, more
specialized components. For example, a visual input at a particular
retinal location might be separated into contrast and color.

Conversely, if we specify PDFs for the leaf nodes, the root node X
will receive retrospective support from Y and Z. Amongst other
possibilities, this provides a simple way to model redundant sensory
inputs. If we take the conditional probabilities to be of narrow
Gaussian form, $p(y|x) = N(y; x, \sigma_y^2)$ and
$p(z|x) = N(z; x, \sigma_z^2)$, then the firing rates representing
$p(x;t)$ will be updated so as to pool the diagnostic information
from both the sensory inputs, Y and Z.
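The pooling of two redundant sensory inputs at a root node can be sketched with the root update rule (equations 31 through 33). The sketch rests on assumptions made only for illustration: a shared grid for x, y, and z, Gaussian-derived orthonormal bases, identical narrow Gaussian links for both leaves, slightly different evidence at the two leaves, and forward Euler integration. The helper names are hypothetical.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 201)     # shared grid for x, y, and z (an assumption)
dx = x[1] - x[0]

def orthonormal_basis(n_funcs, width):
    """Orthonormalized Gaussian bumps; rows are basis functions on the grid."""
    centers = np.linspace(-1.0, 1.0, n_funcs)
    bumps = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)
    q, _ = np.linalg.qr(bumps)
    return q.T / np.sqrt(dx)

def gaussian_pdf(center, width):
    p = np.exp(-0.5 * ((x - center) / width) ** 2)
    return p / (p.sum() * dx)

def child_weights(Phi, Psi, sigma):
    """Feedback and lateral weights for a link N(child; x, sigma^2)."""
    cond = np.exp(-0.5 * ((x[:, None] - x[None, :]) / sigma) ** 2)
    cond /= cond.sum(axis=0, keepdims=True) * dx
    G = cond @ Phi.T * dx
    return Psi @ G * dx, G.T @ G * dx

Phi = orthonormal_basis(8, 0.25)    # root X
Psi = orthonormal_basis(8, 0.25)    # leaf Y
Theta = orthonormal_basis(8, 0.25)  # leaf Z
Omega_y, T_y = child_weights(Phi, Psi, sigma=0.05)
Omega_z, T_z = child_weights(Phi, Theta, sigma=0.05)

B = Psi @ gaussian_pdf(0.1, 0.25) * dx      # evidence from sensor Y
C = Theta @ gaussian_pdf(0.3, 0.25) * dx    # evidence from sensor Z

K_y, K_z, eta, dt = 1.0, 1.0, 1.0, 0.1
A = np.zeros(8)
for _ in range(500):
    grad_y = T_y @ A - Omega_y.T @ B        # equation 32
    grad_z = T_z @ A - Omega_z.T @ C        # equation 33
    A -= dt * eta * (K_y * grad_y + K_z * grad_z)   # Euler step of equation 31

p_x = A @ Phi                               # decoded estimate of p(x; t)
mean_x = (x * p_x).sum() / p_x.sum()
print(mean_x)
```

With equal weights K_y and K_z, the mean of the decoded estimate lies between the two sensor readings; in this sketch, increasing one of the weights pulls the estimate towards the corresponding input.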

Similar arguments apply to larger trees. Additionally, larger trees
may well have the root node and leaf nodes specified simultaneously
(which is not of interest for the small tree discussed above). These
nodes may correspond to sensory inputs, or can represent priors that
are built into the neural network.

It is important to recognize that the neural update rules developed
above apply only to the binary trees. If the arrows in the binary
tree graphs are reversed, rather different neural update rules are
produced. For example, the directions of the small tree shown in
Figure 3b can be reversed, as shown in Figure 3c. Specifying Y and Z
yields the multiplicative update rule

$$ \frac{1}{\eta K_x} \frac{dA_\alpha(t)}{dt} = -A_\alpha(t) + \sum_{\beta,\gamma} B_\beta(t)\,C_\gamma(t) \iiint \Phi_\alpha(x)\,p(x|y,z)\,\Psi_\beta(y)\,\Theta_\gamma(z)\,dx\,dy\,dz \qquad (38) $$

while specifying X and Z yields

$$ \frac{1}{\eta K_x} \frac{dB_\beta(t)}{dt} = \sum_{\alpha,\gamma} A_\alpha(t)\,C_\gamma(t) \iiint \Phi_\alpha(x)\,p(x|y,z)\,\Psi_\beta(y)\,\Theta_\gamma(z)\,dx\,dy\,dz - \sum_{\beta',\gamma,\gamma'} B_{\beta'}(t)\,C_\gamma(t)\,C_{\gamma'}(t) \int \left( \iint p(x|y,z)\,\Psi_\beta(y)\,\Theta_\gamma(z)\,dy\,dz \right) \left( \iint p(x|y,z)\,\Psi_{\beta'}(y)\,\Theta_{\gamma'}(z)\,dy\,dz \right) dx \qquad (39) $$

This latter update rule features a feedforward term that is
multilinear in the firing rates $\{A_\alpha(t)\}$ and
$\{C_\gamma(t)\}$. It also features a nonlinear lateral combination
of the firing rates $\{B_\beta(t)\}$ and $\{C_\gamma(t)\}$ which
serves to ensure that the two parent nodes Y and Z are mutually
consistent with the PDF of the child node X. If only X is specified,
there will be an additional update rule for the firing rates
$\{C_\gamma(t)\}$ that is similar in form to equation 39.



Although the update rules are more complicated with the directions
reversed in this manner, the probabilistic model may demand it. For
instance, the Bayesian belief network in Figure 3c is appropriate for
implementing the arithmetic operations (add, subtract, multiply, and
divide).

Figure 4: A neural belief network can estimate the velocity of a
moving target. (a) The position of the target is copied into two
different populations of neurons, with different time delays. (b) The
time delay and the difference of the two copies of position are used
to estimate the velocity. The results shown here were obtained with
the PDFs for each of the random variables represented using minimal
spaces spanned by two straight-line basis functions. Both panels are
plotted against time step.

An interesting application of these two types of neural belief
networks (tree and reversed-tree) is the estimation of the velocity
of a moving object. A small tree (Figure 3b) can generate two copies
Y and Z of the input position X by taking the conditional
probabilities to be Dirac delta functions $\delta(y - x)$ and
$\delta(z - x)$. We can set the parameters $K_i$ so that the values
of the copies will be held for different lengths of time. In
particular, we can establish the relations

$$ p(y;t) = p(x;t-\tau) \qquad (40) $$

$$ p(z;t) = p(x;t-2\tau) \qquad (41) $$

These copies of the position can then be used as the inputs to a
Bayesian belief network of the type shown in Figure 3c. By setting
$p(x|y,z) = \delta(x - (y - z)/\tau)$, the velocity (given here by
$x$) at time $t - \tau$ can be estimated (Figure 4).
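The velocity computation can be sketched directly at the level of PDFs, without the neural dynamics. The sketch assumes Gaussian PDFs for the two delayed copies of position, treats the copies as independent (as in the product form of equation 38), and uses hypothetical values for the delay and the positions; the variable names are illustrative.

```python
import numpy as np

grid = np.linspace(-1.0, 1.0, 201)          # position grid (an assumption)
d = grid[1] - grid[0]
n = grid.size

def gaussian_pdf(center, width):
    p = np.exp(-0.5 * ((grid - center) / width) ** 2)
    return p / (p.sum() * d)

tau = 0.5                                    # assumed delay between the copies
p_y = gaussian_pdf(0.2, 0.1)                 # copy of position delayed by tau (equation 40)
p_z = gaussian_pdf(-0.1, 0.1)                # copy of position delayed by 2 tau (equation 41)

# Density of the difference y - z under the product p(y) p(z).
# Entry m of the full convolution corresponds to y - z = (m - (n - 1)) * d.
p_diff = np.convolve(p_y, p_z[::-1]) * d
diff = (np.arange(2 * n - 1) - (n - 1)) * d

# The link delta(v - (y - z) / tau) rescales the axis and the density.
v = diff / tau
p_v = p_diff * tau
mean_v = (v * p_v).sum() / p_v.sum()
print(mean_v)
```

With these values the target moves from -0.1 to 0.2 during one delay of 0.5, so the velocity PDF is centered at 0.6, and its width reflects the combined uncertainty of the two position estimates.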

3.3 Top-Down Feedback From a High-Level Model

In section 3.2, we examined the neural update rules for Bayesian
belief networks having binary-tree structure. By doing so, we
examined the means by which information propagates throughout the
network from one or more sources. In particular, we saw that
conditional independence in the Bayesian belief network was preserved
in the neural network connectivity.

We now turn from the general rules by which probabilistic information
propagates in the neural network and investigate in more detail the
effects that multiple sources of evidence have on the encoded PDFs.
We consider PDFs encoded by chains of nodes with evidence provided at
one or both ends. (Specifying the PDF for any other node breaks the
chain into two chains that can be treated independently.) A chain is
a special case of a tree, so the neural update rules can be obtained
as in section 3.2.



For the chosen demonstration, we employ representations of higher
dimensionality than those generally used in the preceding
developments. For each graph node, the interval [-1, 1] is covered
with ten Gaussians (Figure 5a), from which an orthonormal basis is
generated. This makes it possible to represent a variety of PDFs,
including multimodal distributions.

The neural firing rate patterns in the explicit space are also
assumed to be of Gaussian form. To eliminate any possibility of
negative firing rates, the nonlinear activation function is taken to
be rectification.

We have already studied in some depth the behavior of chains
consisting of two nodes (sections 2.2 and 2.3). These chains are able
to transmit probabilistic information from one node of the graph to
another, with the accuracy limited by the representations adopted.
For the predictive network, a feedforward input X drives the output
Y.

To keep the focus on the interaction of multiple sources of evidence,
we take the conditional probability to be a Dirac delta function,
$p(y|x) = \delta(y - x)$. Of course, this defines a version of the
communication channel that we have extensively considered, but the
goal is now to duplicate the PDF rather than a particular numerical
value. As seen in Figure 5b, the input PDF $p(x;t)$ is copied with
great fidelity by the output PDF $p(y;t)$.

Every chain with three or more nodes will admit both predictive and
retrospective support. The behavior of all such chains is well
characterized by a chain with three nodes: the update rule for any
node depends only on the neighboring nodes, so a chain with three
nodes covers all possibilities (only predictive support, only
retrospective support, and both predictive and retrospective
support). Therefore, we consider the effect of adding a third node to
the chain (Figure 3d). We set $p(z|y) = \delta(z - y)$ along with
$p(y|x) = \delta(y - x)$ and use identical parameters $K_y$ and
$K_z$.

The first possibility is just to add the third node without adding
any additional evidence to the network. We decode $p(y;t)$ and
$p(z;t)$ from neural firing rates determined using update rules
derived previously for more general trees. For the same input PDF as
above, $p(y;t)$ is unchanged by the addition of the retrospective
support from the third node, and the structure of $p(z;t)$ is
identical to that of $p(y;t)$.

Since the same evidence was presented to the network, the
introduction of a third node did not change the behavior of the
original nodes. This is entirely appropriate, given the probabilistic
foundations of the neural belief networks. However, it is feasible
that, by adjusting the $K_i$ parameters or the PDFs representable by
the minimal spaces, there can be a change in the dynamics of neural
networks extended in this fashion. For example, appropriate parameter
choice could produce a slowly varying PDF in Z that is relatively
stable against noise, thus stabilizing a more rapidly varying PDF in
Y.

Figure 5: Propagation of evidence in chain-structured neural belief
networks. (a) Gaussian functions spanning the minimal spaces for the
chain-structured neural belief networks under consideration.
Orthonormal bases for the minimal spaces are formed from these
functions. (b) A neural belief network accurately transmits PDFs in a
chain of two nodes. The decoded output PDF $p(y;t)$ is very similar
to the encoded input PDF $p(x;t)$. (c) Multiple sources of evidence
can help to resolve ambiguous information. Here, the inferior mode of
a bimodal, bottom-up input $p(x;t)$ is damped by a more specific
top-down signal $p(z;t)$. (d) A high-level model can be dynamically
generated in a neural belief network with a population Z of
winner-take-all neurons. An ambiguous input signal at X is propagated
to the winner-take-all neurons through Y. The winner-take-all neurons
only respond to the stronger mode of the bimodal input, which damps
the inferior mode in $p(y;t)$. The functions $p(y;t)$ and $p(z;t)$
are only approximations to PDFs, since they can take on negative
values.

A more proactive role that the third node Z can play is as an 
additional source of evidence. We directly specify $p(z; t)$ and encode
it into the neural network. The neural firing rates for Y are driven
by predictive support from X and retrospective support from Z, and
then decoded to find $p(y; t)$. One possible use of this second source
of evidence is to resolve an ambiguous input; the inferior mode of a
bimodal predictive input can be de-emphasized using more specific
retrospective evidence (Figure 5c).
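
The effect of combining the two sources of evidence can be sketched at the level of the PDFs themselves, setting aside the neural encoding. The sketch assumes the idealized case of delta-function conditionals and uniform priors, in which the belief at Y is proportional to the product of the predictive and retrospective messages. The mode locations, weights, and widths below are hypothetical values chosen only for illustration.

```python
import numpy as np

# Grid over the variable y (hypothetical range and resolution).
y = np.linspace(-5.0, 5.0, 1001)
dy = y[1] - y[0]

def gaussian(grid, mean, width):
    """Normalized Gaussian density evaluated on the grid."""
    return np.exp(-0.5 * ((grid - mean) / width) ** 2) / (width * np.sqrt(2.0 * np.pi))

def normalize(f):
    """Rescale a nonnegative function so that it integrates to one."""
    return f / (np.sum(f) * dy)

def mass_right(f):
    """Probability mass assigned to y > 0, where the inferior mode sits."""
    return np.sum(f[y > 0.0]) * dy

# Bimodal predictive (bottom-up) evidence arriving from X.
predictive = 0.6 * gaussian(y, -1.5, 0.5) + 0.4 * gaussian(y, 1.5, 0.5)

# More specific retrospective (top-down) evidence arriving from Z.
retrospective = gaussian(y, -1.5, 1.0)

# Idealized belief at Y: product of the two messages, renormalized.
belief = normalize(predictive * retrospective)

print(f"inferior-mode mass before: {mass_right(predictive):.3f}")
print(f"inferior-mode mass after:  {mass_right(belief):.3f}")
```

In this sketch the inferior mode is strongly suppressed rather than removed, since the retrospective evidence assigns it small but nonzero support.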

Although this use of the retrospective evidence resolves the ambigu- 
ous predictive input, it does not explain how the retrospective evidence 
comes about. Ideally, we would like to build up a high-level model of 
the predictive input, and use this model to generate a top-down signal 
that imposes global regularity on the bottom-up predictive input. The
uniform priors that we have assumed throughout this work are something
of a hindrance for this purpose. With these priors, probabilities
account well for multiple possibilities, but here we want to choose only 
one of those possibilities. 

In principle, we could restrict the allowed PDFs for $p(z; t)$ by
imposing a prior on the PDFs and rederiving our neural update rules.
However, we will adopt a more direct approach that illustrates the
first steps towards implementing decision theory (Raiffa, 196^) in
neural networks and take the neurons representing $p(z; t)$ to be
winner-take-all units. Only one of these neurons will be active at a time, based
on a simple utility function: the neurons compete to be active, and the 
single neuron that is most strongly driven will be the one that is ac- 
tivated. The manner of implementation of the winner-take-all units is 
immaterial, so we directly choose the most strongly driven neuron in 
the computer simulations. (It is possible to implement a set of winner- 
take-all units in a real network through a suitable choice of nonlinear 
activation function and lateral weights. One may let each neuron in- 
hibit the others and have a self-excitatory connection. See Hertz et al., 
1991.) 
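
Both options mentioned above can be sketched briefly. The first function performs the direct selection used in the simulations. The second is a MAXNET-style competition, named here as an illustrative stand-in for a network with self-excitation and mutual inhibition; it is not claimed to be the specific scheme described by Hertz et al. The function names, the inhibition strength, and the example drive values are hypothetical.

```python
import numpy as np

def winner_take_all(drive):
    """Direct selection: one-hot vector marking the most strongly driven unit."""
    drive = np.asarray(drive, dtype=float)
    activity = np.zeros_like(drive)
    activity[np.argmax(drive)] = 1.0
    return activity

def lateral_inhibition_winner(drive, inhibition=None, max_steps=1000):
    """MAXNET-style competition with rectified units.

    Each unit keeps its own activity (self weight of one) and is inhibited
    by the summed activity of all other units. Returns the index of the
    single surviving unit, or None if no unique winner emerges.
    """
    activity = np.clip(np.asarray(drive, dtype=float), 0.0, None)
    n = activity.size
    if inhibition is None:
        # Assumed setting: below 1/n, so the largest unit cannot be silenced.
        inhibition = 0.5 / n
    for _ in range(max_steps):
        if np.count_nonzero(activity) <= 1:
            break
        others = activity.sum() - activity
        activity = np.maximum(0.0, activity - inhibition * others)
    active = np.flatnonzero(activity)
    return int(active[0]) if active.size == 1 else None

# Hypothetical drives to four Z neurons.
drive = np.array([0.2, 0.9, 0.5, 0.7])
one_hot = winner_take_all(drive)
winner = lateral_inhibition_winner(drive)
print(one_hot, winner)
```

With a unique maximum, both routes select the same unit; exact ties leave the competition unresolved, in which case the second function returns `None`.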

Since only the most strongly driven winner-take-all neuron is ac- 
tivated, this strategy provides us with a way to generate a high-level 
model that selects the dominant mode of a multimodal input distribution.
We allow Z to be driven by the predictive support from Y, and the
winner-take-all nature of the Z neurons permits only narrow Gaussian
PDFs to be represented. Thus, $p(z; t)$ serves as our high-level model,
and can resolve ambiguous inputs, as demonstrated in Figure 5d. This
type of neural network could be used as the starting point for coherent
theoretical accounts of attentive effects in the primate visual system,
of the electrosensory system of weakly electric
fish, and of other neural systems where an internal model is built up 
to impose global constraints on neural representations of information. 
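
The loop just described, in which an ambiguous input drives a winner-take-all population whose single active unit then feeds back as top-down evidence, can be sketched at the level of the PDFs. This is a simplified illustration, not the neural update rule of this paper. It assumes that each Z unit stands for one narrow Gaussian, that a unit's drive is the overlap of its Gaussian with the signal at Y, and that the feedback combines with the input by a renormalized product. All numerical values and names are hypothetical.

```python
import numpy as np

# Grid over the variable y (hypothetical range and resolution).
y = np.linspace(-5.0, 5.0, 1001)
dy = y[1] - y[0]

def gaussian(grid, mean, width):
    """Normalized Gaussian density evaluated on the grid."""
    return np.exp(-0.5 * ((grid - mean) / width) ** 2) / (width * np.sqrt(2.0 * np.pi))

def normalize(f):
    """Rescale a nonnegative function so that it integrates to one."""
    return f / (np.sum(f) * dy)

# Ambiguous bottom-up signal as it arrives at Y.
predictive = 0.6 * gaussian(y, -1.5, 0.5) + 0.4 * gaussian(y, 1.5, 0.5)

# Hypothetical Z population: one narrow Gaussian per winner-take-all unit.
z_centers = np.linspace(-3.0, 3.0, 13)
z_width = 0.5
z_templates = np.stack([gaussian(y, c, z_width) for c in z_centers])

# Assumed drive: overlap of each unit's Gaussian with the signal at Y.
drive = z_templates @ predictive * dy

# Winner-take-all: only the most strongly driven unit is active.
winner = int(np.argmax(drive))

# The active unit's Gaussian serves as the top-down model, which is
# combined with the bottom-up signal by a renormalized product.
top_down = z_templates[winner]
belief = normalize(predictive * top_down)

inferior_before = np.sum(predictive[y > 0.0]) * dy
inferior_after = np.sum(belief[y > 0.0]) * dy
print(f"winning center: {z_centers[winner]:+.2f}")
print(f"inferior-mode mass: {inferior_before:.3f} -> {inferior_after:.5f}")
```

The winning unit sits at the dominant mode, so the feedback damps the inferior mode without any externally supplied top-down signal.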



4 Conclusions 

We have extended the hypothesis that neural networks represent infor- 
mation as probability density functions. These PDFs are assumed to 
be obtainable by a linear combination of some implicit decoding func- 
tions, with the decoder for each neuron being weighted by its firing 
rate. The firing rates in turn are obtainable from the PDF using a 
complementary set of encoding functions. 

Success in representing an individual PDF with a population of neurons
(Barber et al., 2001) has led us to inquire whether more complex
probabilistic models can be represented and implemented in neural
terms. To this end, we have adopted graphical representations of
probabilistic dependence models, in the form of Pearl's Bayesian belief 
networks, to organize and simplify the relations between the random 
variables in such a model. 

In summary, we have introduced, analyzed, and applied three ways 
to represent probabilistic information. These are the implicit model, 
depicted as a Bayesian belief network; the representation of the prob- 
abilistic model in the minimal space; and the representation of the 
probabilistic model as a neural network in the explicit space. Further, 
we have devised rules that permit us to convert one type of repre- 
sentation into another type. This formalization of the representation 
and processing of information in neurobiological computation therefore 
suggests a general protocol by which probabilistic models can be em- 
bedded in neural networks. First, we specify the probabilistic model, 
using a Bayesian belief network to organize the random variables. We 
then consider what sorts of functions are exemplars of the PDFs de- 
scribing the random variables and use these exemplars to define the 
minimal space. Finally, we utilize the relations between the minimal 
space and the explicit space to generate the neural network itself. 